Boosting with Early Stopping: Convergence and Consistency
Authors
Tong Zhang and Bin Yu
Abstract
Boosting is one of the most significant advances in machine learning for classification and regression. In its original and computationally flexible version, boosting seeks to empirically minimize a loss function in a greedy fashion. The resulting estimator takes an additive function form and is built iteratively by applying a base estimator (or learner) to updated samples that depend on the previous iterations. An unusual regularization technique, early stopping, is employed based on cross-validation (CV) or a test set. This paper studies the numerical convergence, consistency, and statistical rates of convergence of boosting with early stopping, when it is carried out over the linear span of a family of basis functions. For general loss functions, we prove the convergence of boosting’s greedy optimization to the infimum of the loss function over the linear span. Using the numerical convergence result, we find early stopping strategies under which boosting is shown to be consistent based on i.i.d. samples, and we obtain bounds on the rates of convergence for boosting estimators. Simulation studies are also presented to illustrate the relevance of our theoretical results for providing insights into practical aspects of boosting. As a side product, these results also reveal the importance of restricting the greedy search step sizes, as known in practice through the works of Friedman and others. Moreover, our results lead to a rigorous proof that for a linearly separable problem, AdaBoost with step size ε → 0 becomes an $L_1$-margin maximizer when left to run to convergence.
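To make the ingredients above concrete, the following minimal sketch runs greedy boosting over the linear span of a stump basis with a restricted step size, stopping early when the loss on a held-out set stops improving. The stump basis, the squared-error loss, and all constants are illustrative assumptions, not choices taken from the paper.

```python
# Minimal sketch: greedy boosting over the linear span of a stump basis,
# with a small (restricted) step size and early stopping on a held-out set.
import numpy as np

rng = np.random.default_rng(0)
n = 400
x = rng.uniform(-1, 1, n)
y = np.sin(3 * x) + 0.3 * rng.standard_normal(n)

# Training set plus a validation set used only to decide when to stop.
x_tr, y_tr, x_va, y_va = x[:300], y[:300], x[300:], y[300:]

# Basis family: decision stumps h_t(x) = sign(x - t) over a grid of thresholds.
thresholds = np.linspace(-1, 1, 41)
def stump(xs, t):
    return np.where(xs > t, 1.0, -1.0)

eps = 0.05                      # restricted (small) greedy step size
f_tr = np.zeros_like(y_tr)
f_va = np.zeros_like(y_va)
best_val, best_iter = np.inf, 0

for m in range(500):
    resid = y_tr - f_tr         # negative gradient of the squared loss
    # Greedy step: pick the stump (and sign) most correlated with the residual.
    scores = np.array([resid @ stump(x_tr, t) for t in thresholds])
    j = np.argmax(np.abs(scores))
    sign = np.sign(scores[j])
    f_tr += eps * sign * stump(x_tr, thresholds[j])
    f_va += eps * sign * stump(x_va, thresholds[j])
    val_loss = np.mean((y_va - f_va) ** 2)
    if val_loss < best_val:
        best_val, best_iter = val_loss, m
    elif m - best_iter > 50:    # stop once the validation loss has stalled
        break

print(f"stopped at iteration {m}; best validation MSE {best_val:.3f} at iteration {best_iter}")
```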
Similar Articles
High-Dimensional $L_2$Boosting: Rate of Convergence
Boosting is one of the most significant developments in machine learning. This paper studies the rate of convergence of L2Boosting, which is tailored for regression, in a high-dimensional setting. Moreover, we introduce so-called “post-Boosting”. This is a post-selection estimator which applies ordinary least squares to the variables selected in the first stage by L2Boosting. Another variant is...
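As a rough illustration of the two-stage idea described above, the sketch below runs componentwise L2Boosting on a sparse high-dimensional linear model and then refits ordinary least squares on the selected variables ("post-Boosting"). The data-generating model, shrinkage factor, and iteration cap are assumptions made for the example.

```python
# Sketch of post-Boosting: componentwise L2Boosting selects variables,
# then ordinary least squares is refit on the selected set only.
import numpy as np

rng = np.random.default_rng(1)
n, p, s = 200, 500, 5                     # high-dimensional: p >> n, s-sparse signal
X = rng.standard_normal((n, p))
beta = np.zeros(p); beta[:s] = 1.0
y = X @ beta + 0.5 * rng.standard_normal(n)

X_std = (X - X.mean(0)) / X.std(0)
nu, n_steps = 0.1, 200                    # shrinkage factor and iteration cap
coef = np.zeros(p)
resid = y - y.mean()

for _ in range(n_steps):
    fits = X_std.T @ resid / n            # componentwise least-squares coefficients
    j = np.argmax(np.abs(fits))           # best-fitting single covariate
    coef[j] += nu * fits[j]
    resid -= nu * fits[j] * X_std[:, j]

selected = np.flatnonzero(coef)           # first-stage L2Boosting selection
# Second stage ("post-Boosting"): plain OLS restricted to the selected columns.
ols_coef, *_ = np.linalg.lstsq(X_std[:, selected], y - y.mean(), rcond=None)
print("selected variables:", selected[:10], "...")
print("post-OLS coefficients (first few):", np.round(ols_coef[:5], 2))
```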
Optimization by gradient boosting
Gradient boosting is a state-of-the-art prediction technique that sequentially produces a model in the form of linear combinations of simple predictors—typically decision trees—by solving an infinite-dimensional convex optimization problem. We provide in the present paper a thorough analysis of two widespread versions of gradient boosting, and introduce a general framework for studying these al...
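The "linear combination of simple predictors" form is what off-the-shelf implementations produce. The snippet below, which assumes scikit-learn's GradientBoostingRegressor, fits shallow regression trees with a small learning rate and stops early on a held-out validation fraction; the dataset and hyperparameters are illustrative, not values from the cited paper.

```python
# Illustration: gradient boosting as an additive combination of shallow trees,
# with early stopping on a validation fraction (assumes scikit-learn is installed).
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, (1000, 5))
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2 + 0.2 * rng.standard_normal(1000)

model = GradientBoostingRegressor(
    n_estimators=1000,        # generous cap on the number of trees
    learning_rate=0.05,       # small step size, in the spirit of restricted-step boosting
    max_depth=2,              # "simple predictors": shallow trees
    validation_fraction=0.1,  # held-out fraction used for early stopping
    n_iter_no_change=20,      # stop if the validation score stalls for 20 rounds
    random_state=0,
).fit(X, y)

print("trees actually fit before early stopping:", model.n_estimators_)
```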
Early stopping for kernel boosting algorithms: A general analysis with localized complexities
Early stopping of iterative algorithms is a widely-used form of regularization in statistics, commonly used in conjunction with boosting and related gradient-type algorithms. Although consistency results have been established in some settings, such estimators are less well-understood than their analogues based on penalized regularization. In this paper, for a relatively broad class of loss func...
On Early Stopping in Gradient Descent Boosting
In this paper, we study a family of gradient descent algorithms to approximate the regression function from reproducing kernel Hilbert spaces. Here early stopping plays the role of regularization: given a finite sample and some regularity condition on the regression function, a stopping rule is given and some probabilistic upper bounds are obtained for the distance between the function iter...
On Early Stopping in Gradient Descent Learning
In this paper, we study a family of gradient descent algorithms to approximate the regression function from Reproducing Kernel Hilbert Spaces (RKHSs), the family being characterized by a polynomially decreasing rate of step sizes (or learning rate). By solving a bias-variance trade-off we obtain an early stopping rule and some probabilistic upper bounds for the convergence of the algorithms. Thes...
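A minimal sketch of that setting: kernel gradient descent on the empirical squared loss with a polynomially decreasing step size, stopped after a prescribed number of iterations. The Gaussian kernel, the step-size exponent, and the stopping time used here are illustrative stand-ins, not the rule derived in the paper.

```python
# Sketch: gradient descent in an RKHS with step sizes eta_t = eta0 * (t+1)^(-theta),
# stopped early after an (illustrative) sample-size-dependent number of iterations.
import numpy as np

rng = np.random.default_rng(3)
n = 200
x = np.sort(rng.uniform(0, 1, n))
y = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(n)

# Gram matrix of a Gaussian kernel on the sample points.
K = np.exp(-((x[:, None] - x[None, :]) ** 2) / (2 * 0.05 ** 2))

alpha = np.zeros(n)                 # f_t(x_i) = (K @ alpha)_i
eta0, theta = 1.0, 0.25             # polynomially decreasing step sizes
T_stop = int(n ** 0.5)              # illustrative early stopping time, growing with n

for t in range(T_stop):
    grad = (K @ alpha - y) / n      # gradient of the empirical squared loss
    alpha -= eta0 * (t + 1) ** (-theta) * grad

f_hat = K @ alpha
print("training MSE at the stopping time:", round(float(np.mean((f_hat - y) ** 2)), 3))
```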